The mathematical expectation, denoted as $E(X)$ or $\mu_X$, serves as the fundamental measure of central tendency for a random variable. It represents the "long-run average" value obtained over repeated trials. Physically, it is the center of mass of a probability distribution, calculated as the probability-weighted sum of all possible outcomes.
Formal Definitions
For discrete random variables, we define the expected value based on the Probability Mass Function (PMF):
Definition 3.1.1
Let $X$ be a discrete random variable with range $R_X$. The expected value is:
$$E(X) = \sum_{x \in R_X} x P(X = x) = \sum_{x \in R_X} x p_X(x)$$
Definition 3.1.2
If $X$ takes distinct values $x_1, x_2, \dots$ with probabilities $p_i = P(X = x_i)$, then, provided the sum converges absolutely:
$$E(X) = \sum_i x_i p_i$$
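The weighted sum in Definition 3.1.2 can be sketched directly in code. The fair-die PMF below is an illustrative choice, not from the text:

```python
# E(X) as a probability-weighted sum over the PMF.
# Example PMF: a fair six-sided die (illustrative choice).
pmf = {x: 1/6 for x in range(1, 7)}

def expectation(pmf):
    """Compute E(X) = sum of x * p_X(x) over the range of X."""
    return sum(x * p for x, p in pmf.items())

print(expectation(pmf))  # 3.5
```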
The Law of the Unconscious Statistician (LOTUS)
To find the expectation of a transformed variable $g(X)$, we do not need to derive the distribution of $g(X)$ first.
Theorem 3.1.1 (LOTUS)
For any function $g$, the expected value of $g(X)$ is the sum of the function values weighted by the original probabilities:
$$E(g(X)) = \sum_{x} g(x) P(X=x)$$
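A quick sketch contrasting LOTUS with the "long way." The PMF and the choice $g(x) = x^2$ are assumed for illustration:

```python
from collections import defaultdict

pmf = {-1: 0.25, 0: 0.5, 1: 0.25}  # assumed example PMF
g = lambda x: x**2

# LOTUS: weight g(x) by the ORIGINAL probabilities
lotus = sum(g(x) * p for x, p in pmf.items())

# The long way: first derive the PMF of Y = g(X), then take E(Y)
pmf_y = defaultdict(float)
for x, p in pmf.items():
    pmf_y[g(x)] += p
direct = sum(y * p for y, p in pmf_y.items())

print(lotus, direct)  # both 0.5
```

Both routes agree, but LOTUS skips the intermediate bookkeeping entirely.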
Core Properties
- Linearity (Theorem 3.1.2): $E(aX + bY) = aE(X) + bE(Y)$. This holds even if $X$ and $Y$ are dependent!
- Independence (Theorem 3.1.3): If $X$ and $Y$ are independent, $E(XY) = E(X)E(Y)$.
- Monotonicity (Theorem 3.1.4): If $X(s) \le Y(s)$ for all outcomes $s$, then $E(X) \le E(Y)$.
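These properties can be checked numerically on a small joint PMF. The dependent pair below (with $Y = 1 - X$) is an assumed example; it shows linearity surviving dependence while the product rule fails:

```python
# Dependent pair: Y = 1 - X, so X and Y are NOT independent.
joint = {(0, 1): 0.5, (1, 0): 0.5}  # assumed joint PMF

E = lambda f: sum(f(x, y) * p for (x, y), p in joint.items())
EX, EY = E(lambda x, y: x), E(lambda x, y: y)

# Linearity holds even under dependence:
lhs = E(lambda x, y: 2*x + 3*y)
print(lhs, 2*EX + 3*EY)              # both 2.5

# The product rule fails without independence:
print(E(lambda x, y: x*y), EX * EY)  # 0.0 vs 0.25
```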
Example 3.1.6: Indicators
For the indicator random variable $I_A$ of an event $A$, defined by $I_A = 1$ if $A$ occurs and $I_A = 0$ otherwise:
$E(I_A) = (1)P(A) + (0)P(A^c) = P(A)$
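The identity $E(I_A) = P(A)$ also shows up in simulation: the sample mean of an indicator estimates the event's probability. The event below ("a fair die shows 5 or 6", so $P(A) = 1/3$) is an assumed example:

```python
import random

# Monte Carlo sketch of E(I_A) = P(A) for A = "die shows 5 or 6"
random.seed(0)
n = 100_000
indicator_mean = sum(1 for _ in range(n) if random.randint(1, 6) >= 5) / n
print(indicator_mean)  # approximately 1/3
```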